
Extract YouTube Data

The functions in this section extract detailed information from YouTube videos, including metadata and transcripts. You can use them to build APIs, tools, and applications.

get_video_meta(video_url) Function

This function takes a single argument, video_url, and fetches metadata for the specified YouTube video. It returns a dictionary with the following keys:

  • video_id: The unique identifier for the video.
  • video_title: The title of the video.
  • video_description: A description of the video.
  • video_length: The duration of the video in seconds.
  • video_views: The number of times the video has been viewed.
  • video_author: The creator of the video.
  • video_publish_date: The publication date of the video.
  • video_thumbnail_url: The URL of the video's thumbnail.
  • video_rating: The average user rating of the video.
  • video_keywords: Keywords associated with the video.

Example Usage

from SimplerLLM.tools.youtube import get_video_meta

video_meta = get_video_meta("https://www.youtube.com/watch?v=r9PjzmUmk1w")

print(video_meta)

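The metadata is returned in the following format. Note that some fields, such as video_description and video_rating here, may be None when no value is available: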

{'video_id': 'r9PjzmUmk1w', 'video_title': 'Build SaaS with WordPress With 3 Plugins Only!', 'video_description': None, 'video_length': 252, 'video_views': 25845, 'video_author': 'Hasan Aboul Hasan', 'video_publish_date': datetime.datetime(2024, 2, 15, 0, 0), 'video_thumbnail_url': 'https://i.ytimg.com/vi/r9PjzmUmk1w/hqdefault.jpg?sqp=-oaymwEXCJADEOABSFryq4qpAwkIARUAAIhCGAE=&rs=AOn4CLDLDIEv0NjGrhaAKQ8GL2SpvwDDng', 'video_rating': None, 'video_keywords': []}

If you only want the video title, read it from the dictionary by its key:

from SimplerLLM.tools.youtube import get_video_meta

video_meta = get_video_meta("https://www.youtube.com/watch?v=r9PjzmUmk1w")

print(video_meta.get('video_title'))

Use the same approach with any other key to extract its value.
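Because some fields can be None, it can help to read values defensively before displaying them. The following is a minimal sketch, not part of SimplerLLM: summarize_meta is a hypothetical helper, and sample_meta copies a few values from the example output above so the snippet runs without a network call.

```python
from datetime import datetime


def summarize_meta(meta: dict) -> str:
    """Build a one-line summary from a metadata dictionary (hypothetical helper)."""
    # Fall back to placeholders when a key is missing or its value is None.
    title = meta.get("video_title") or "Untitled"
    author = meta.get("video_author") or "Unknown author"
    # video_length is given in seconds; split it into minutes and seconds.
    minutes, seconds = divmod(meta.get("video_length") or 0, 60)
    published = meta.get("video_publish_date")
    date_text = published.strftime("%Y-%m-%d") if published else "unknown date"
    return f"{title} by {author} ({minutes}:{seconds:02d}, published {date_text})"


# Sample values copied from the example output above.
sample_meta = {
    "video_title": "Build SaaS with WordPress With 3 Plugins Only!",
    "video_description": None,
    "video_length": 252,
    "video_author": "Hasan Aboul Hasan",
    "video_publish_date": datetime(2024, 2, 15, 0, 0),
    "video_rating": None,
}

print(summarize_meta(sample_meta))
```

In practice you would pass the dictionary returned by get_video_meta instead of sample_meta.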

get_youtube_transcript(video_url) Function

This function also takes only video_url and returns the video's transcript as a single, readable string.

Example Usage

from SimplerLLM.tools.youtube import get_youtube_transcript

video_transcript = get_youtube_transcript("https://www.youtube.com/watch?v=r9PjzmUmk1w")

print(video_transcript)
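Since the transcript comes back as one string, a long video can produce more text than you want to process at once. One possible approach is to split it into word-limited chunks. This is a sketch under assumptions: chunk_transcript is a hypothetical helper, not a SimplerLLM function, and sample_transcript is a short stand-in for the real return value.

```python
def chunk_transcript(transcript: str, max_words: int = 200) -> list[str]:
    """Split a transcript string into chunks of at most max_words words (hypothetical helper)."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = transcript.split()
    # Step through the word list in strides of max_words and rejoin each slice.
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


# Stand-in text; in practice, use the string returned by get_youtube_transcript.
sample_transcript = (
    "hi friends in this video I will show you how to turn any WordPress "
    "website into a full SAS business using only three plugins"
)

chunks = chunk_transcript(sample_transcript, max_words=10)
for number, chunk in enumerate(chunks, start=1):
    print(number, chunk)
```

Splitting on whitespace keeps the sketch simple; it does not try to respect sentence boundaries.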

get_youtube_transcript_with_timing(video_url) Function

This function also takes only video_url and retrieves the video's transcript along with timing information for each segment. It returns a list of dictionaries, one per transcript segment, each containing the following keys:

  • text: The transcript text of a specific segment of the video.
  • start: The start time of the segment in seconds.
  • duration: The duration of the segment in seconds.

Example Usage

from SimplerLLM.tools.youtube import get_youtube_transcript_with_timing

video_transcript = get_youtube_transcript_with_timing("https://www.youtube.com/watch?v=r9PjzmUmk1w")

print(video_transcript)

Here's what a small section of the output looks like:

[{'text': 'hi friends in this video I will show you', 'start': 0.12, 'duration': 6.08}, {'text': 'how to turn any WordPress website into a', 'start': 2.639, 'duration': 7.481}, {'text': 'full SAS business using only three', 'start': 6.2, 'duration': 7.639}, {'text': 'plugins this is exactly what I did on my', 'start': 10.12, 'duration': 6.56}, {'text': 'website you will see here I have a list', 'start': 13.839, 'duration': 5.401}]
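With the start times available, you can, for example, find where a word is mentioned and show a readable timestamp. The following is a minimal sketch that relies only on the text and start keys described above. format_timestamp and find_segments are hypothetical helpers, and sample_segments copies the example output so the snippet runs offline.

```python
def format_timestamp(seconds: float) -> str:
    """Convert a start time in seconds to MM:SS (hypothetical helper)."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def find_segments(segments: list[dict], keyword: str) -> list[str]:
    """Return timestamped lines for segments whose text contains keyword, ignoring case."""
    keyword = keyword.lower()
    return [
        f"[{format_timestamp(segment['start'])}] {segment['text']}"
        for segment in segments
        if keyword in segment["text"].lower()
    ]


# Sample segments copied from the example output above.
sample_segments = [
    {"text": "hi friends in this video I will show you", "start": 0.12, "duration": 6.08},
    {"text": "how to turn any WordPress website into a", "start": 2.639, "duration": 7.481},
    {"text": "full SAS business using only three", "start": 6.2, "duration": 7.639},
    {"text": "plugins this is exactly what I did on my", "start": 10.12, "duration": 6.56},
    {"text": "website you will see here I have a list", "start": 13.839, "duration": 5.401},
]

for line in find_segments(sample_segments, "website"):
    print(line)
```

In practice you would pass the list returned by get_youtube_transcript_with_timing instead of sample_segments.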

That's how SimplerLLM makes extracting YouTube data simpler!